Abstract

Advances in fabrication technology have steadily increased the computational capacity of a single chip, and many applications now expect their computing resources to be integrated on one chip. As a result, interconnecting the intellectual property (IP) cores has become a difficult task in its own right. In many-core System-on-Chips (SoCs), the Network-on-Chip (NoC) is replacing shared buses and dedicated wiring as the on-chip interconnect. The NoC was created as a framework for the networks inside an SoC, and modern multiprocessor architectures benefit from using a NoC as their communication backbone. The most important components of any network structure are its topology, routing algorithm, and router architecture, and a NoC uses the router at each node to route traffic. Circuit complexity, long critical-path latency, resource usage, timing, and power efficiency are the primary shortcomings of conventional NoC router architectures, and building a high-performance, low-latency NoC with little area overhead remains difficult. This paper surveys previous methods and strategies for NoC router topologies and studies the general router architecture and its components. The analysis aims at a low-latency, low-power, high-performance NoC router design that can be used across a wide range of FPGA families. In the current work, we structure a modified four-port router with the goals of low area and high performance.

Keywords: System-on-Chip (SoC), Network-on-Chip (NoC), NoC router, routing algorithm, switching method.


1. Introduction 
Networks-on-Chip (NoCs) have recently become increasingly important to VLSI development. As integration levels increased, systems with various kinds of applications and distinct I/O traffic characteristics emerged. Since the beginning of VLSI, internal communication has dominated die area, and it also governs clock speed and power usage. Buses are becoming less popular, particularly given the increasing complexity of single-die multiprocessor systems. Consequently, the primary characteristic of a NoC is that data interchange within the chip is established through networking technology. Because all of a NoC's links can carry data at the same time, it offers a high degree of parallelism and is a compelling alternative to conventional communication systems such as dedicated point-to-point wires or shared buses [1]. With more and more cores on a single chip, NoC technology is becoming essential for connecting them, and NoCs have proven to be a crucial component of chip multiprocessors (CMPs), connecting hundreds or even thousands of cores [2]. For many-core SoCs, a NoC provides a scalable and flexible inter-core communication infrastructure in which the router at every node directs traffic. Generally, a baseline router pipeline has two stages [3]. In addition to their throughput, NoC platforms are scalable and can keep up with the rate of technological advancement.

A NoC can be modeled as a graph whose nodes represent processing elements and whose edges represent the connections between them. The basic NoC architecture, shown in Figure 1, consists mainly of routers and processing elements (PEs). Each PE is attached through a network interface (NI) to a nearby router. A packet travels hop by hop across the network from a source PE to a destination PE, following the choice made by each router. As in any other network, the router is the most crucial element of a NoC system's communication backbone.

International Research Journal on Advanced Engineering Hub (IRJAEH)
e ISSN: 2584-2137
Vol. 02 Issue: 07 July 2024
Page No: 1895- 1908
https://irjaeh.com
https://doi.org/10.47392/IRJAEH.2024.0260

In a packet-switched network, the router either forwards an incoming packet to another router linked to it or delivers it to the destination resource if that resource is directly connected. Because implementation cost rises with a router's design complexity, a NoC router's design should be as simple as feasible [4]. However, NoC-based frameworks can suffer significant inter-core communication latency because of the large number of routers a packet must traverse between source and destination cores, as well as the buffering at each router. To reduce NoC communication delay while maintaining high throughput, a router must perform multiple tasks in parallel, including route computation, virtual channel (VC) allocation, and switch allocation. Nevertheless, designing low-latency NoC routers remains difficult for on-chip systems [5].
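The graph model of a NoC described above can be made concrete with a short sketch. The Python below is illustrative only: it assumes a 2D mesh (one common NoC topology among many), and the function names `mesh_graph` and `hop_count` are hypothetical. It builds the adjacency list of a mesh and counts the minimum number of router-to-router hops a packet needs between two nodes.

```python
from collections import deque

def mesh_graph(cols: int, rows: int) -> dict:
    """Adjacency list of a cols x rows 2D mesh; each node (x, y) is one router with its PE."""
    graph = {}
    for x in range(cols):
        for y in range(rows):
            neighbors = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < cols and 0 <= ny < rows:  # a mesh has no wrap-around links
                    neighbors.append((nx, ny))
            graph[(x, y)] = neighbors
    return graph

def hop_count(graph: dict, src: tuple, dst: tuple) -> int:
    """Minimum number of router-to-router hops from src to dst, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("destination is unreachable")
```

In a 4x4 mesh, a packet from corner (0, 0) to corner (3, 3) needs six hops, which is why per-router latency weighs more heavily as the network grows.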


Figure 1 Basic NOC Architecture 


Current NoC routers map several virtual channels (VCs) onto a single physical channel for various purposes, such as increasing system throughput and preventing deadlock in fully adaptive routing [6]. VCs are also used to separate resources and so avoid application-level deadlock between different message classes [7], and quality of service (QoS) is improved through the creation of virtual networks (VNs) [8]. Because VCs increase the complexity of the router architecture, new VC allocation stages must be added to the existing router pipeline. Wormhole flow control is frequently used in NoC routers to lower on-chip memory consumption [9], which matters because SoCs have constrained area and power budgets. A wormhole router splits a packet into multiple smaller flow control digits (flits) and buffers them in flit-serial order across multiple routers along the path, so that no single router needs to store the whole packet. Multiple VCs per physical channel bring further advantages: they boost throughput by acting as escape routes that let active packets get around head-of-line (HoL) blocking, they avoid protocol-level and network-level deadlocks, and they form VNs that support QoS for different applications. Considerable effort has gone into designing routers and routing topologies that offer minimal latency [10]. On the other hand, the strict power and area constraints of a chip favor straightforward network topologies and routers. Numerous on-chip network design enhancements have been proposed to lower power consumption and boost performance, addressing switching and flow control mechanisms, router microarchitecture design, mapping techniques, and layout design. Among these, flow control and buffering are critical to power reduction [11].
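As a minimal sketch of the flit segmentation behind wormhole flow control, the Python below splits a packet into head, body, and tail flits. The `Flit` type, the `packetize` function, and the fixed flit size in bytes are hypothetical; real head flits usually also carry routing and VC information, which this sketch omits.

```python
from dataclasses import dataclass

@dataclass
class Flit:
    kind: str        # "head", "body", "tail", or "head_tail" for a single-flit packet
    packet_id: int
    payload: bytes

def packetize(packet_id: int, data: bytes, flit_bytes: int) -> list:
    """Split one packet into flow control digits (flits) for wormhole switching."""
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)] or [b""]
    if len(chunks) == 1:
        return [Flit("head_tail", packet_id, chunks[0])]
    flits = [Flit("head", packet_id, chunks[0])]          # head flit opens the path
    flits += [Flit("body", packet_id, c) for c in chunks[1:-1]]
    flits.append(Flit("tail", packet_id, chunks[-1]))     # tail flit releases the path
    return flits
```

Only the head flit needs route computation and VC allocation; body and tail flits follow the path the head reserves, and the tail releases it, which is what lets each router buffer a few flits instead of the whole packet.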

1.1. Scope and Purpose of the Work 

A NoC can use a variety of router topologies, and which one is employed depends on the concern of interest. This study reviews work on the various router topologies used to route packets efficiently from one terminal to another, and it also covers the basic NoC architecture. Finally, the structure of a four-port router design is analyzed in terms of its architecture and operation.

2. Literature Review 

Wang et al. [12] studied the design of on-chip network microarchitectures from a power-driven perspective. They first examined the power efficiency of an existing network architecture and then proposed three power-efficient router microarchitectures, assessing their impact on power, performance, and area using probabilistic analysis and comprehensive power modeling. The proposed microarchitectures were evaluated with both synthetic and real CMP benchmark traffic traces. Compared with a baseline microarchitecture based on then-current on-chip network designs, they saved 44.9% of power under uniform random traffic and 37.9% with TRIPS CMP traces. These savings were achieved without compromising network performance, and in certain situations performance was significantly enhanced.

Mullins et al. [13] presented the design of a low-latency on-chip network router. To reduce cycle time and latency, control overheads (routing and arbitration logic) are removed from the critical path. Simulations show that this significantly shortens the cycle time and the critical path without sacrificing router efficiency. Additionally, by allowing flits to be routed in a single cycle, the design makes the most of the router's limited buffering resources. The main parts of the router have been roughly laid out, and a 0.18 µm VLSI implementation at 1.2 GHz is planned. A novel grid-based distributed clocking scheme supports the design and guarantees little skew between neighboring routers.

Salah et al. [14] designed a scalable packet-based router architecture. The design is built on an m x n mesh that carries the data flow and dynamically controls a number of switches, which connect the computing resources (IPs) and arrange communications in parallel; both the switches and the proposed router are described. The design was modeled in VHDL at the RTL level and, following the 2D-topology literature, simulated for an XY mesh implementation, chosen for its simplicity, and for 2D torus configurations of 2x2, 3x3, and 4x4. The network's scalability and the design's routing method were then evaluated. In the short-distance technique, which uses VHDL as the description language, a specific amount of electrical simulation and synthesis effort between switches is eliminated.
Kale et al. [15] presented an on-chip router architecture that minimizes silicon area and reduces power consumption. After simulating a unidirectional router, they synthesized and simulated a five-port router using Active-HDL and Quartus II Web Edition, which helps in understanding the correct operation of a five-port router for a network on chip.

Shrivastava et al. [16] proposed a low-area, low-power NoC architecture that does away with virtual channels. Elastic buffers replace the conventional buffers, and the crossbar is divided into two sections to obtain the benefits of both buffered and bufferless crossbars. Compared with a baseline router, the proposed router's area is reduced by 47.89% and its power consumption by 11.2% in Microwind 3.5.

Ghorse et al. [17] studied the NoC design paradigm, a promising architectural option for upcoming systems on chip. Switch allocation (SA) and virtual channel (VC) allocation in a router depend on one another; an efficient virtual channel router removes this dependence by using speculation to perform the two tasks simultaneously. This parallelism lets the router operate at a higher frequency and considerably shortens the on-chip router's clock cycle. Simulation results show that performing VC allocation and SA simultaneously greatly shortens the critical path without sacrificing router efficiency. The router uses 1074 flip-flops, considerably more than other router architectures, but they are clocked at the highest possible frequency to improve speed and decrease network latency.

Poluri et al. [18] proposed a dependable NoC router architecture that can withstand numerous persistent faults. To improve fault tolerance, each pipeline stage of the NoC router is carefully analyzed and minimal corrective circuitry is incorporated. Compared with other fault-tolerant routers and the baseline NoC router, the proposed router provides higher dependability without consuming excessive area or power. According to reliability studies using Mean Time to Failure (MTTF), the proposed router is six times as reliable as the unprotected baseline NoC router. It is also compared with other fault-tolerant routers using the Silicon Protection Factor (SPF) metric. Hardware synthesis with a commercial 45 nm technology library in the Cadence Encounter RTL Compiler shows that the corrective circuitry has a 31% area overhead and a 30% power overhead.

Nasirian et al. [19] proposed a power-efficient adaptive routing strategy for NoC routers. In a power-gated network, the strategy routes traffic based on each router's status: by switching paths, it avoids waking routers that are in the sleep state. In simulation, average latency improves by 35% compared with a conventional power-gated design, and static power usage falls by nearly 80% compared with a non-power-gated design.
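The idea of routing around sleeping routers can be sketched as an output-port selection function. This is not the algorithm of [19], whose details are not given here; it is a hypothetical minimal adaptive policy for a 2D mesh, assuming port names E/W/N/S (N meaning increasing y) and a set `asleep` holding the coordinates of power-gated routers.

```python
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def productive_ports(cur: tuple, dst: tuple) -> list:
    """Output ports that bring the packet one hop closer to dst (minimal routing)."""
    (x, y), (dx, dy) = cur, dst
    ports = []
    if dx != x:
        ports.append("E" if dx > x else "W")
    if dy != y:
        ports.append("N" if dy > y else "S")
    return ports

def power_aware_port(cur: tuple, dst: tuple, asleep: set) -> str:
    """Prefer a productive port whose neighboring router is awake (hypothetical policy)."""
    ports = productive_ports(cur, dst)
    if not ports:
        return "LOCAL"                 # packet has arrived: eject to the local PE
    for port in ports:
        sx, sy = STEP[port]
        if (cur[0] + sx, cur[1] + sy) not in asleep:
            return port                # awake neighbor: no wake-up delay
    return ports[0]                    # every candidate is gated: one neighbor must be woken
```

Because the policy adapts freely among minimal directions, a practical design would also need a deadlock-avoidance mechanism such as turn restrictions or escape VCs.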
Monemi et al. [20] described a router micro-architecture with a two-clock-cycle delay, in which the switch allocator and the VC allocator work in parallel using a request masking technique. The architecture removes the need to give any input VC (IVC) request a higher priority, and it ensures that any request approved by the switch allocator can successfully pass a flit to the output port. To achieve this, an effective masking approach filters out every switch allocation request that cannot pass a flit to the output port, either because the assigned VC lacks free buffer space or, for requests without an assigned VC, because the output port has no free VC.
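The masking rule described for [20] can be illustrated by a filter applied before switch allocation. In this hypothetical sketch, `credits` maps an (output port, VC) pair to the free downstream buffer slots and `free_vcs` counts unassigned VCs per output port; both data structures and all names are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SARequest:
    in_vc: int              # requesting input VC (illustrative identifier)
    out_port: str           # output port chosen by route computation
    out_vc: Optional[int]   # output VC already assigned, or None if not yet assigned

def mask_requests(requests, credits, free_vcs):
    """Keep only switch-allocation requests that could actually forward a flit.

    credits[(port, vc)] -> free buffer slots in that downstream VC
    free_vcs[port]      -> number of unassigned VCs at that output port
    """
    eligible = []
    for req in requests:
        if req.out_vc is not None:
            can_send = credits.get((req.out_port, req.out_vc), 0) > 0   # assigned VC needs space
        else:
            can_send = free_vcs.get(req.out_port, 0) > 0                # unassigned needs a free VC
        if can_send:
            eligible.append(req)
    return eligible
```

Because only eligible requests reach the allocator, any grant it issues can forward a flit, so no switch cycle is spent on a request that would stall.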
Additionally, the masking technique makes effective use of the VC memory buffers, and it barely affects a NoC router's area overhead and timing.

Deivakani et al. [21] proposed a low-power router design that uses on-chip wireless communication as express links for data transfer across subnet routers. The hybrid NoC router's average packet latency and normalized power consumption are examined under four synthetic traffic patterns: shuffle, bit complement, transpose, and bit reverse. Compared with a wired NoC, the proposed hybrid NoC router reduces normalized power by 12.18% for consumer traffic, 12.80% for auto-industrial traffic, and 12.5% for MPEG traffic.
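The synthetic traffic patterns named for [21] are commonly defined as bit permutations of an n-bit source node address. The sketch below uses those common definitions; the exact variants used in [21] are not stated, so treat them as an assumption. For bit i of destination d and source s: bit complement d_i = NOT s_i, bit reverse d_i = s_(n-1-i), shuffle d_i = s_((i-1) mod n), and transpose d_i = s_((i+n/2) mod n).

```python
def bit_complement(src: int, n: int) -> int:
    """d_i = NOT s_i."""
    return ~src & ((1 << n) - 1)

def bit_reverse(src: int, n: int) -> int:
    """d_i = s_(n-1-i): mirror the address bits."""
    dst = 0
    for i in range(n):
        if (src >> i) & 1:
            dst |= 1 << (n - 1 - i)
    return dst

def shuffle(src: int, n: int) -> int:
    """d_i = s_((i-1) mod n): rotate the address left by one bit."""
    mask = (1 << n) - 1
    return ((src << 1) | (src >> (n - 1))) & mask

def transpose(src: int, n: int) -> int:
    """d_i = s_((i+n/2) mod n): swap the upper and lower halves of the address."""
    if n % 2:
        raise ValueError("transpose needs an even number of address bits")
    half, mask = n // 2, (1 << n) - 1
    return ((src >> half) | (src << half)) & mask
```

For a 4x4 mesh with 4-bit addresses whose upper two bits give y and lower two bits give x, transpose maps node (x, y) to node (y, x).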
The authors of [21] also use the Network Simulator 2 tool to analyze performance in real-time traffic scenarios.

Verma et al. [22] proposed a router based on wormhole switching. The router is regarded as a crucial part of an on-chip network, and the paper describes it together with the elements and characteristics that affect the overall design. The routers are built on FPGA platforms and use round-robin arbitration: in each clock cycle the router observes the FIFOs of its input ports and dynamically adjusts each input port's priority, so that every input port is served fairly. To verify the NoC's hardware functionality, the router was described in VHDL and simulated with Xilinx ISE 14.1, targeting the XC5VLX30-3 FPGA.

Karthikeyana et al. [23] proposed a three-dimensional lottery routing system built on an arbitration mechanism that prioritizes buffers randomly. Through the lottery router, users can configure communication between the IPs in the NoC. The lottery method determines which input port has the higher priority and ensures that the router responds to that port. An efficient hardware implementation of the 3D NoC was developed on a Xilinx Spartan-3E FPGA; it runs at a maximum frequency of about 103.602 MHz and uses 1644 of 4656 slices. Compared with a single layer, the 3D NoC's power usage was 9% lower.
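The round-robin scheme of Verma et al. [22] and the lottery mechanism of Karthikeyana et al. [23] are both arbitration policies. The Python sketch below contrasts a round-robin arbiter, which rotates priority so every input is eventually served, with a lottery arbiter, which grants randomly in proportion to per-input ticket counts. Function names and the ticket representation are hypothetical; hardware arbiters make the same decisions in logic rather than loops.

```python
import random
from typing import List, Optional

def round_robin_grant(requests: List[bool], last: int) -> Optional[int]:
    """Grant the first requesting input after `last`, wrapping around; None if idle."""
    n = len(requests)
    for offset in range(1, n + 1):
        idx = (last + offset) % n
        if requests[idx]:
            return idx
    return None

def lottery_grant(requests: List[bool], tickets: List[int], rng: random.Random) -> Optional[int]:
    """Grant a requesting input with probability proportional to its ticket count."""
    contenders = [i for i, req in enumerate(requests) if req and tickets[i] > 0]
    if not contenders:
        return None
    draw = rng.randrange(sum(tickets[i] for i in contenders))
    for i in contenders:
        draw -= tickets[i]
        if draw < 0:
            return i
    return None  # not reached: draw is always below the ticket total
```

With all four inputs requesting and `last` updated to each grant, the round-robin arbiter starting from `last=0` grants inputs 1, 2, 3, 0 in turn, which is the fairness property described for [22]. The lottery arbiter instead trades strict fairness for tunable, probabilistic priority.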
Shen et al. [24] ("An Efficient Network-on-Chip Router for Dataflow Architecture," Journal of Computer Science and Technology) developed an effective NoC router for dataflow architectures. The router supports multiple destinations, so it can transport data to several locations in a single transfer. In addition, it uses non-flit packets to reduce transfer latency and output buffers to maximize throughput. With this router, the dataflow architecture performs 3.6 times better than with a state-of-the-art router.

Cunlu Li et al. [25] developed a mechanism that uses reorder buffer (RoB) techniques to schedule packets in input buffers. Virtual channels (VCs) are designed as RoBs so that packets that are not at the head of a VC can be allocated before the head packets. RoBs reduce switch allocation conflicts and HoL blocking and thereby enhance NoC performance. The authors then presented the RoB-Router, which uses elastic RoBs in VCs to restrict how much of a VC can function as a RoB; the RoB length in each VC is calculated automatically from the number of buffered flits. The design achieves high efficiency while minimizing resources. Two further strategies were proposed to enhance the RoB-Router: one alters the VC allocation strategy to optimize the packet order in input buffers, and the other combines the RoB-Router with TS-Router, the most effective switch allocator available at the time. Compared with TS-Router, the approach improves packet delay by 46% under simulated traffic and by 15.7% on PARSEC traces, at a moderate energy and area cost.

Su et al. [26] (The Journal of Supercomputing, 2018) described a highly efficient dynamic router for application-oriented networks on chip. They proposed a router architecture with intra-port and inter-port allocation mechanisms to boost network performance. By changing the virtual channel unit of the NoC's routers, the new architecture increases buffer utilization without compromising network performance. The router resolves head-of-line blocking by using virtual output queues (VOQs), and it can dynamically distribute the traffic load among several ports of the application-oriented NoC.
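To illustrate why VOQs remove head-of-line blocking, the sketch compares a single input FIFO, where only the head packet may leave, with per-output queues at the same input. Class and function names are hypothetical, and the model ignores timing; it only shows which packet is eligible when some outputs are busy.

```python
from collections import deque

def fifo_dequeue(fifo: deque, free_outputs):
    """Single input FIFO of (packet, out_port): only the head packet is eligible."""
    if fifo and fifo[0][1] in free_outputs:
        packet, out = fifo.popleft()
        return out, packet
    return None                       # head-of-line blocking: the head's output is busy

class VOQInput:
    """One input port holding a separate queue per output port (virtual output queues)."""

    def __init__(self, outputs):
        self.queues = {out: deque() for out in outputs}

    def enqueue(self, packet, out_port):
        self.queues[out_port].append(packet)

    def dequeue_for(self, free_outputs):
        """Forward one packet to any free output that has traffic waiting; None if none can move."""
        for out in free_outputs:
            if self.queues.get(out):
                return out, self.queues[out].popleft()
        return None
```

With the east output busy, the single FIFO stalls behind a packet for E even though a packet for N is waiting, while the VOQ input forwards the N packet immediately.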
Muhammad Rashid et al. [27] presented effective methods for enhancing a NoC router's resilience against persistent faults. In the proposed architecture, the input port uses virtual channel (VC) queuing, VC closing methods, and bypass paths. The routing computation stage uses spatial redundancy and double routing, and the VC allocation stage also uses spatial redundancy. The switch allocation stage uses run-time arbiter selection, and the crossbar stage uses three bypass buses. Compared with the most advanced fault-tolerant routers in use, the proposed router achieves a high level of fault tolerance. According to hardware synthesis results, it uses 28% more power and 26.6% more area than the baseline router; however, in terms of the mean-time-to-failure metric, its reliability is 7.98 times higher than that of the unprotected baseline router.

Smitha H N [28] presented an FPGA-based reconfigurable router with low power consumption and good performance for use in NoCs. The proposed design consists of four channels (east, west, north, and south) plus the crossbar switch. Each direction has First In First Out (FIFO) buffers to store information and multiplexers to regulate data input and output. The FIFO buffer has a width of three and a depth of four, so it comprises four sections, each able to hold three pieces of data. Each channel has five multiplexers: three regulate the FIFO's read and write processes, and two control the input and output of information. The router is described structurally in SystemVerilog.
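A behavioral model of such a FIFO, written in Python rather than the SystemVerilog used in [28], is sketched below. It assumes a circular buffer with read and write pointers and an occupancy counter, a common structure but not necessarily the one in [28]; `depth` and `width` default to four entries of three bits, and `FlitFIFO` is a hypothetical name.

```python
class FlitFIFO:
    """Circular FIFO of `depth` entries, each `width` bits wide (stored values are masked)."""

    def __init__(self, depth: int = 4, width: int = 3):
        self.depth, self.mask = depth, (1 << width) - 1
        self.mem = [0] * depth
        self.rd = self.wr = self.count = 0

    @property
    def full(self) -> bool:
        return self.count == self.depth

    @property
    def empty(self) -> bool:
        return self.count == 0

    def write(self, value: int) -> bool:
        if self.full:
            return False               # refused: upstream must hold the flit until space frees up
        self.mem[self.wr] = value & self.mask
        self.wr = (self.wr + 1) % self.depth   # write pointer wraps around
        self.count += 1
        return True

    def read(self):
        if self.empty:
            return None
        value = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth   # read pointer wraps around
        self.count -= 1
        return value
```

The same class with `depth=16` and `width=8` models the buffer described for [29].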
For the router in [28], ModelSim 10.4a and Vivado are used for simulation and synthesis, respectively, and the proposed reconfigurable router is implemented on Xilinx Spartan-6 FPGAs. After simulation and synthesis of the router architecture, the absolute power is determined with the XPower Analyzer tool.

Akhtar et al. [29] designed a reconfigurable router for NoCs that achieves high system performance and low power consumption. In this design the FIFO buffer has a depth of 16, and each memory location holds eight bits of data. Each channel has five multiplexers: two manage the input and output signals, and the remaining three manage the FIFO's read and write operations. The proposed router design is implemented in Verilog HDL.
Simulation is done with ModelSim and synthesis with Xilinx ISE Design Suite 14.5, targeting Xilinx Spartan-6 FPGAs. After the proposed router is simulated and synthesized, the total power is computed with the XPower Analyzer program. The router consumes a total power of 20 mW, a reduction of about 6 mW in this configuration.

Dharmale et al. [30] presented the architecture and characteristics of NoCs and provided a First In First Out (FIFO) buffer-based design for on-chip routers. By employing an effective flow control strategy that uses the storage already available in pipelined channels instead of explicit input virtual channel buffers, the approach does away with virtual channels (VCs). Compared with a generic router, this design can save up to 66.66% of power and reduce delay by 99.80%; compared with virtual channel routers, it can save up to 60.48% of power and reduce delay by 90.88%.

Savithri et al. [31] built a 1x4 tree-topology NoC router, simulated it in Verilog HDL, and implemented it on a Spartan-3 XC3S400 FPGA. Network on Chip is a novel paradigm for creating connections inside a System on Chip (SoC), where bus architectures have traditionally been used. A bus structure will not suffice for new technology: as integration grows, the bus becomes a bottleneck and, in the worst case, starts to obstruct traffic. NoC technology addresses these shortcomings by replacing the bus structure with a network, which blocks use to exchange messages and send packets of data. The routers in a NoC are responsible for routing data packets, while links connect devices to routers and routers to one another. Routers are connected to processors, memory, and other IP blocks, also known as processing elements (PEs). Because the router is a key component, an effective NoC architecture requires its careful design.

The computing demands of applications are driving an exponential increase in the number of cores in on-chip multiprocessor systems, and effective communication among the processors on a chip significantly affects the power, speed, and area requirements of such systems. Three-dimensional (3D) Optical Network-on-Chip (ONoC) is a viable approach for such complex integrated interconnect systems. As the central component of a 3D ONoC, the optical router requires an optimal design with respect to component count, insertion loss, power consumption, and other factors. Jadhav et al. [32] proposed a novel design for an inter-layer (vertical) optical router and a 6 x 6 intra-layer non-blocking optical router using micro-ring resonators (MRRs), and used the Phoenix simulator for performance analysis. Compared with current 3D ONoC non-blocking optical routers, the proposed 6 x 6 optical router has the fewest waveguide bends and crossings, and the proposed inter-layer optical router further reduces waveguide crossings. Compared with benchmarks, an ONoC using the proposed design with the X-Mesh topology performs better in terms of insertion loss and signal-to-noise ratio [32].

Melvin T et al. [33] discussed the design and implementation of a congestion-aware NoC router using Vivado HLS. The router is then used to build a scalable mesh-based NoC. As a test bed, the NoC is used to run simulations and estimate performance parameters such as latency, waiting time, and total packets processed for different NoC configurations, with options to change parameters such as traffic, packet injection interval, buffer depth, and packet size. A straightforward method for identifying congestion at a router is also proposed. Using this congestion metric, XY dimension-order routing (DOR) is then modified into a minimal adaptive X/Y routing scheme with reduced hardware overhead.
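The difference between XY dimension-order routing and a minimal adaptive X/Y scheme can be sketched as two port-selection functions. The congestion metric here is a hypothetical per-port number (for example, occupied downstream buffer slots); the congestion detection method of [33] is not reproduced. Ports E/W/N/S and coordinates (x, y) are assumptions.

```python
def xy_route(cur: tuple, dst: tuple) -> str:
    """Dimension-order (XY) routing: correct the X offset first, then Y."""
    (x, y), (dx, dy) = cur, dst
    if dx != x:
        return "E" if dx > x else "W"
    if dy != y:
        return "N" if dy > y else "S"
    return "LOCAL"

def congestion_aware_route(cur: tuple, dst: tuple, congestion: dict) -> str:
    """Minimal adaptive X/Y: of the (at most two) productive ports, pick the less congested.

    `congestion[port]` is a hypothetical per-output-port metric; on a tie, min() keeps
    the first candidate, which is the X-direction port, so the choice matches XY routing.
    """
    (x, y), (dx, dy) = cur, dst
    candidates = []
    if dx != x:
        candidates.append("E" if dx > x else "W")
    if dy != y:
        candidates.append("N" if dy > y else "S")
    if not candidates:
        return "LOCAL"
    return min(candidates, key=lambda port: congestion.get(port, 0))
```

Plain XY routing is deadlock-free on a mesh because it never turns from the Y dimension back to X; letting the adaptive variant choose Y first removes that guarantee, so practical designs add turn restrictions or escape channels.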
The proposed routing technique of [33] is compared with traditional XY DOR, GCA routing, and RCS-based routing under various parameter settings. The results show that, at medium packet injection rates, the proposed strategy can lower packet latency for a variety of traffic patterns.

Samanth et al. [34] designed and analyzed a router whose behavior depends mainly on a few crucial elements: topology, routing algorithm, and packet format. The proposed router's performance is assessed in terms of power and slack time (ST) and compared with that of the Conventional Wormhole Router (CWHR). The proposed architecture has 18% less slack than a conventional wormhole router, which leads to a 29.3% reduction in overall power usage compared with the conventional wormhole architecture. The architecture is synthesized with the Cadence Encounter RTL Compiler in standard 65 nm technology.

T.K. Ramesh et al. [35] first presented the URR-Router, which gives uniform requests priority by allocating EPC requests in the SA stage only when no uniform requests are present. The second design, the EPR-Router, prioritizes EPC requests over uniform ones to reduce endpoint congestion. Building on a sequence of priority modifications between uniform and EPC requests, the study offers a new perspective on SA optimization that forms the foundation of the low-latency CUE-Router architecture. CUE-Router facilitates communication between the pipeline's stages, which significantly increases network performance and SA efficiency. It outperforms TS-Router in overall NoC performance by 8.10% with synthetic traffic and by 8.75% on average with application-level traffic. Compared with the basic router, CUE-Router lowers leakage power consumption by 3.7% and overhead by 4.1%.

M. N. Saranya et al. [36] carried out the design and functional verification of an asynchronous NoC router microarchitecture. First, they use the commercially available Cadence Spectre Analog and Mixed-Signal (AMS) Designer tool to present a novel mixed-level abstract simulation approach for faster functional verification of the asynchronous architecture. This methodology verifies the design's viability and pinpoints flaws before the later implementation stages. The research also proposes a new baseline asynchronous router with a novel hybrid encoding method based on a domino logic pipeline template; the hybrid encoding enables a simple architecture without imposing any extra timing constraints. The proposed verification approach is used to assess the functional verification of the baseline asynchronous router with Cadence's AMS Designer tool, and initial simulation results are in line with the paper's goals. The same verification setup also establishes design validation in later phases of the design implementation.

3. NOC Router Design 

A NoC router consists of four sequential pipeline stages:
(1) Route computation (RC): the first stage determines the output port to which a packet must be transmitted.
(2) VC allocation (VA): the second stage assigns a free VC in the neighboring router attached to that output port. Arbitration is necessary because multiple header flits may request the same VC. Only the header flit needs route computation and VC allocation; the corresponding body and tail flits follow the header flit.
(3) Switch allocation (SA): if VC allocation is successful, the third stage asks the switch allocator to grant the output port.
(4) Switch traversal (ST): if switch allocation is successful, in the fourth stage the flit traverses the crossbar to the output port and continues toward its destination.
A NoC router architecture is shown in Figure 2. Table 1 summarizes different types of router design methodologies.
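The four-stage pipeline can be summarized with a small model. The sketch assumes one cycle per stage, `link_cycles` cycles per link, no contention, and that the remaining flits follow the head back-to-back, one per cycle; these are simplifying assumptions, and `stages_for` and `zero_load_latency` are hypothetical names.

```python
HEAD_STAGES = ("RC", "VA", "SA", "ST")   # route computation, VC allocation, switch allocation, switch traversal
BODY_STAGES = ("SA", "ST")               # body and tail flits reuse the head flit's route and VC

def stages_for(kind: str) -> tuple:
    """Pipeline stages a flit of the given kind passes through in one router."""
    if kind in ("head", "head_tail"):
        return HEAD_STAGES
    if kind in ("body", "tail"):
        return BODY_STAGES
    raise ValueError(f"unknown flit kind: {kind}")

def zero_load_latency(routers: int, flits: int, link_cycles: int = 1) -> int:
    """Cycles until the tail flit has crossed the last router and its output link.

    Assumes one cycle per pipeline stage, `link_cycles` per link, no contention,
    and back-to-back flits, so the tail trails the head by (flits - 1) cycles.
    """
    head_latency = routers * (len(HEAD_STAGES) + link_cycles)
    return head_latency + (flits - 1)
```

For example, a five-flit packet crossing three routers takes 3 x (4 + 1) + 4 = 19 cycles under these assumptions, which shows why shortening the per-router pipeline is a recurring goal among the surveyed designs.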


Table 1 Summary of Different Types of Router Design Methodologies

Sl. No. | Router Type | Process | Contribution | Ref.
1 | RoB-Router: A Reorder Buffer Router for Low Latency | Reorder buffer (RoB) techniques schedule packets in input buffers, with VCs acting as RoBs; RoB-Router uses elastic RoBs in VCs to restrict how much of a VC can function as a RoB; the VC allocation strategy is altered to optimize packet order in input buffers, and the TS-Router switch allocator is combined with RoB-Router | RoBs lessen switch allocation conflicts and HoL blocking and enhance NoC performance; compared to TS-Router, packet delay improves by 45%, and by 15.7% on PARSEC traces; Demerits: moderate energy and area cost | [25]
2 | Dynamic Router for Application-Oriented NoC | Intra-port and inter-port allocation; change of the virtual channel unit; virtual output queues (VOQ); dynamic distribution of the traffic load | Balance of traffic; optimization of delay and throughput improvement; Demerits: buffer utilization increased by 21.8% | [26]
3 | Fault-Tolerant NoC Router for Heterogeneous Computing Systems | VC closing methods, virtual channel (VC) queuing, and bypass paths; spatial redundancy and double routing techniques; run-time arbiter selection; three bypass buses in the crossbar | High level of fault tolerance; Demerits: according to hardware synthesis results, the router uses 28% more power and 26.6% more area than the baseline router | [27]
4 | FPGA-Based Reconfigurable Router | FIFO buffer width of three and depth of four, comprising four sections each holding three pieces of data; three multiplexers regulate the FIFO's read and write processes and two control the input and output of information; SystemVerilog structural description; power gating technique | Low power consumption and good performance; absolute power determined with the XPower Analyzer tool | [28]
5 | Low Power Reconfigurable Router | FIFO buffer depth of 16; of the five multiplexers, two manage the input and output signals and the other three manage FIFO read and write operations; design in Verilog HDL; synthesis on Xilinx Spartan-6 FPGAs | High system performance and low power consumption; total power of 20 mW, reduced by about 6 mW in this configuration | [29]
6 | Efficient NoC Router | Lightweight parallel router architecture; decoding logic and FSM optimizations; adaptation of the single crossbar; store-and-forward switching; low-overhead link-level flow control provided by a handshaking signal | Router tested under various conditions using FSM description control logic; compared to a generic router, saves up to 66.66% of power and reduces delay by 99.80%; compared to virtual channel routers, saves up to 60.48% of power and reduces delay by 90.88% | [30]

t Design of | e Packet based architecture 1 X 4 Router is designed and | [31] 
Four Port | e 1X4 tree topology NoC router verified in Verilog and 
Router for] e Simulation using Verilog HDL implementation done on 
Network on | ¢ Implementation on a Spartan3 Xc3s400 FPGA Spartan 3 FPGA. 

Chip Demerits: Not suitable for 4 X 
4 Router 

8 Non- e Unique design for an inter-layer (vertical) Waveguide crossings are | [32| 
blocking optical router and a 6 x 6 intra-layer non- reduced. 
optical blocking optical router using a micro-ring Design performs better in 
router for resonator (MRR). terms of insertion loss and 
3D optical | e ONoC with X-Mesh topology signal to noise ratio 
NOC e Phoenix simulator for performance analysis. Minimized size and power 

consumption 

9 Congestion | e Congestion Aware NoC router utilizing Vivado Routing technique is | [33] 
aware router HLS contrasted with traditional XY 

e Mesh-based, scalable NoC DOR, GCA routing, and RCS- 
e XY dimension order routing is modified into a based routing algorithms. 
congestion metric According to the findings, at 
e Reduced hardware overhead minimum medium packet injection rates, 
adaptive X/Y routing scheme the suggested routing strategy 
can lower packet latency for a 
variety of traffic patterns. 

10 | Four Port | e Asynchronous FIFO The suggested router | [34] 
Router e Round-Robin Arbiter architecture has 18% less 

e Depending on data size,packet format decided slack than a conventional 

e XY routing algorithm wormhole router. 

e Mesh topology This leads to a 29.3% 
reduction in overall power 
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usage when compared to 
conventional wormhole 
architecture. 

11 | AS(Allocati | e URR-Router, gives uniform requests a priority CUE-Router outperforms TS- | [35] 
on service) - by allocating EPC requests only in the absence Router in terms of overall NoC 
Router of uniform requests in the SA stage. performance by 8.10% with 

e The EPR-Router, which prioritizes EPC synthetic traffic and 8.75% on 
requests over uniform ones, reducing endpoint average with application-level 
congestion traffic. 

e Sequence of priority modifications between Compared to the basic router, 
uniform requests and EPC requests the CUE-Router lowers 

leakage power consumption 
by 3.7 % and overhead by 4.1 
% 

12 | Design and | e Spectre Analog and mixed-signal simulation Switch performance | [36] 
Verification (AMS) Designer tool bottleneck can be found early 
of an |e Verify the design's viability and pinpoint any in the design process by using 
Asynchrono flaws before the design's later implementation the simulation. 
us NoC stages As a result, the design work 
Router e Baseline asynchronous router can be directed effectively in 
Architectur the upcoming phases of 
e for GALS execution. 

Systems 
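Several entries in Table 1 (for example [33] and [34]) build on XY routing. As a hedged illustration only, the sketch below contrasts plain XY dimension-order routing with a generic congestion-aware minimal adaptive variant; it is not the algorithm of Ref. [33]. The coordinate convention (y grows northwards), the port names `E`, `W`, `N`, `S`, `LOCAL`, and the per-port `congestion` map (lower is better, e.g. buffer occupancy) are assumptions. Deadlock freedom, which adaptive routing normally secures through VC or turn restrictions, is not addressed here.

```python
def xy_route(cur, dst):
    """Plain XY dimension-order routing: correct the X offset first, then Y."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "LOCAL"  # packet has reached its destination router


def adaptive_xy_route(cur, dst, congestion):
    """Minimal adaptive routing: pick the less congested productive port.

    `congestion` is an assumed map from port name to a metric where lower
    means less congested. Ties favour X, so with uniform congestion the
    result matches xy_route.
    """
    (cx, cy), (dx, dy) = cur, dst
    x_port = "E" if dx > cx else "W" if dx < cx else None
    y_port = "N" if dy > cy else "S" if dy < cy else None
    if x_port and y_port:
        # Both directions bring the packet closer: compare their congestion.
        return y_port if congestion[y_port] < congestion[x_port] else x_port
    # Only one (or no) productive direction remains.
    return x_port or y_port or "LOCAL"
```

With uniform congestion the adaptive function reduces to XY routing, because ties favour the X direction; it only deviates when the Y port is strictly less congested.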
Figure 2 NoC Router

The NoC router consists of input ports, VC/switch
(SW) allocators, a routing computation module, and
a crossbar. The input ports buffer incoming flits and
send requests to the allocators. The routing
computation module uses the routing algorithm to
identify the output port. Following route
computation, a request is sent to the VC allocator to
assign a free output VC (OVC) in the downstream
router to the input VC (IVC). If an OVC is
successfully assigned, another allocation request is
sent to the switch allocator. If the switch allocation
request is granted, the crossbar is configured to
deliver the flit to the output port [37]. To send
requests to the switch allocator, the available space
in the next router's buffer must be known. Output
port modules therefore track the buffer space
available for each OVC by maintaining a set of credit
counters. Figure 3 shows the router architecture
proposed in Ref. [37].
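The credit counters described above can be sketched as follows. This is a minimal Python model of credit-based flow control, assuming that each counter starts at the depth (in flits) of the matching input-VC buffer in the next router; the class and method names are hypothetical and not taken from Ref. [37].

```python
class OutputPortCredits:
    """Credit counters an output port keeps, one per downstream (output) VC.

    Assumption: each counter starts at the depth, in flits, of the
    corresponding input-VC buffer in the next router.
    """

    def __init__(self, num_vcs, buffer_depth):
        self.depth = buffer_depth
        self.credits = [buffer_depth] * num_vcs

    def can_request_switch(self, ovc):
        # A flit may only bid for the switch if the next router has a free slot.
        return self.credits[ovc] > 0

    def on_flit_sent(self, ovc):
        # Sending a flit consumes one downstream buffer slot.
        if self.credits[ovc] == 0:
            raise RuntimeError("no credit: downstream buffer is full")
        self.credits[ovc] -= 1

    def on_credit_returned(self, ovc):
        # The downstream router freed one buffer slot for this VC.
        if self.credits[ovc] == self.depth:
            raise RuntimeError("credit overflow: more credits than buffer slots")
        self.credits[ovc] += 1
```

For a two-VC port with two-flit buffers, once two flits have been sent on VC 0 that VC can no longer bid for the switch until the downstream router returns a credit, while VC 1 is unaffected.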




Since the allocator largely determines the overall NoC
router performance and area overhead, it is the most
difficult module to design. Additionally, the
allocators lie on the NoC critical path. An allocation
is needed whenever many agents (IVCs) must access
multiple resources (OVCs or output ports) at the same
time. Making the allocator adaptive eases this task,
and the resulting design improves performance. The
router design adopts a 4 x 4 mesh topology. The design
will be coded in Verilog, implemented on an FPGA, and
verified in ModelSim. Use of a modified XY routing
algorithm will minimize the area consumption relative
to the values reported in Table 2.
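To make the allocation problem concrete, the sketch below shows a common textbook organization: a separable, input-first switch allocator built from round-robin arbiters (the arbiter type also listed for [34] in Table 1). It is an illustration in Python, not the adaptive allocator this work aims at; the data layout (`requests[p][v]` holding the output port requested by VC `v` of input port `p`, or `None`) and all names are assumptions.

```python
class RoundRobinArbiter:
    """Grants one requester per call; priority rotates past the last winner."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so requester 0 has the highest priority initially

    def arbitrate(self, requests):
        """requests: list of n booleans. Returns the granted index or None."""
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None


def separable_switch_allocation(requests, input_arbs, output_arbs):
    """Input-first separable switch allocation.

    requests[p][v] is the output port requested by VC v of input port p, or None.
    Returns {input_port: (vc, output_port)} for the flits granted this cycle.
    """
    # Stage 1: each input port nominates one of its requesting VCs.
    # Note: this sketch rotates input priority on every nomination, even if the
    # output stage later rejects it; many real designs update only on success.
    winners = {}
    for p, vc_reqs in enumerate(requests):
        v = input_arbs[p].arbitrate([r is not None for r in vc_reqs])
        if v is not None:
            winners[p] = (v, vc_reqs[v])
    # Stage 2: each output port grants one of the input ports that nominated it.
    grants = {}
    for o, arb in enumerate(output_arbs):
        bids = [p in winners and winners[p][1] == o for p in range(len(requests))]
        p = arb.arbitrate(bids)
        if p is not None:
            grants[p] = winners[p]
    return grants
```

With `requests = [[1, None], [1, 0]]` and fresh two-entry arbiters for two input and two output ports, the first call grants only input 0 (`{0: (0, 1)}`), although VC 1 of input 1 could have used the idle output 0. This is the typical matching inefficiency of separable allocation; on the next call the rotated priorities yield `{0: (0, 1), 1: (1, 0)}`.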


Figure 3 Router Architecture Proposed by Ref. [37] 


Table 2 Summary of Total Area Occupied (Ref. [34])

                              Cells    Cell Area    Total Area (µm²)
Proposed Router               11283    131578       131578
Integration (2x2 Router)      45132    526313       526313
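As a simple consistency check on Table 2 (an observation computed from the table, not a result stated in Ref. [34]), the 2 x 2 integration uses exactly four times the cells of a single router and, to within one unit, four times the area:

```latex
\frac{45132}{11283} = 4, \qquad
\frac{526313}{131578} = 4 + \frac{1}{131578} \approx 4.00
```

This suggests that, in these figures, integrating four routers adds essentially no logic beyond the four router instances themselves.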
Conclusion 


The standardization of NoC for on-chip
communication is being driven by the growing
number of IPs on small chips. To provide reasonable
performance, a NoC needs to handle issues such as
faults and congestion. This study provides an
overview of the design of various NoC router
architectures with respect to hardware use,
including memory blocks, logic cells, and
maximum frequency, to improve performance. This
analysis helps in understanding the basic concepts
and the shortcomings that must be addressed to
achieve better, lower-area router designs in the
future. In the current work, we are developing a
modified routing algorithm with the goals of low
area and high-performance operation for the 4 x 4
mesh topology.